
Conversation

@keyprocedure
Contributor

@keyprocedure keyprocedure commented Jul 29, 2025

Summary

This is PR 3 of 3 implementing a dim-order-aware clone op.

This PR updates the clone removal pass to retain layout-changing aten.clone and _clone_dim_order ops and to remove no-op (identity) clones, ensuring layout/dim order is preserved through export.
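For illustration, a minimal sketch of the kind of identity-clone check such a pass performs (assumptions: helper names like `_is_identity_clone` and `remove_identity_clones` are hypothetical, and the real pass also compares the requested layout against the input tensor's actual dim order via node metadata):

```python
import torch
from torch.fx import GraphModule, Node


def _is_identity_clone(node: Node) -> bool:
    """Hypothetical check: an aten.clone is a no-op only if it does not
    request a memory_format change (the real pass is more thorough)."""
    if node.op != "call_function" or node.target != torch.ops.aten.clone.default:
        return False
    memory_format = node.kwargs.get("memory_format", torch.preserve_format)
    return memory_format in (None, torch.preserve_format)


def remove_identity_clones(gm: GraphModule) -> GraphModule:
    """Remove no-op clones; layout-changing clones are left untouched."""
    for node in list(gm.graph.nodes):
        if _is_identity_clone(node):
            node.replace_all_uses_with(node.args[0])
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm
```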

Related PRs:

  • PR 1: #12974 - Add _clone_dim_order portable kernel
  • PR 2: #13735 - Register _clone_dim_order op and map aten.clone

Fixes #12645

Test plan

Added tests to verify:

  • Clones that change layout are preserved.
  • Clones with unchanged layout (identity ops) are removed.

All tests pass via:
`python -m unittest exir.tests.test_memory_format_ops_pass`
`python -m unittest backends.transforms.test.test_remove_clone_ops`
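To make the identity vs. layout-changing distinction concrete, here is a small self-contained snippet (plain PyTorch, not part of the actual test suite) showing when a clone really changes dim order:

```python
import torch

x = torch.randn(1, 3, 8, 8)  # contiguous (NCHW) dim order

# Identity clone: layout unchanged, so the pass can safely remove it.
same = x.clone()
assert same.stride() == x.stride()

# Layout-changing clone: dim order becomes NHWC, so it must be preserved.
nhwc = x.clone(memory_format=torch.channels_last)
assert nhwc.is_contiguous(memory_format=torch.channels_last)
assert nhwc.stride() != x.stride()
```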

@pytorch-bot

pytorch-bot bot commented Jul 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12976

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 51 Pending, 1 Unrelated Failure

As of commit 15ff154 with merge base 29cec35:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

  • trunk / test-llama-torchao-lowbit / macos-job (gh) (trunk failure)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Jul 29, 2025
@keyprocedure
Contributor Author

@pytorchbot label "release notes: none"

@pytorch-bot pytorch-bot bot added the release notes: none label Jul 29, 2025
Gasoonjia added a commit that referenced this pull request Aug 11, 2025
### Summary
This is PR 1 of 3 implementing a dim order aware clone op. 

Currently, clone ops are removed during export as no-ops, causing memory
layout (dim order) changes to be lost. This can cause backend failures,
incorrect outputs when ops expect specific layouts, and performance
degradation. This set of PRs introduces a dim order aware clone op,
`_clone_dim_order`, which preserves memory layout changes by explicitly
storing dim order information. This is implemented by replacing standard
clone ops with this variant during export and updating the clone removal
transform to preserve clones that change layout.

This PR adds the portable CPU kernel for the `_clone_dim_order` op,
implementing a clone variant that preserves dim order at runtime. The
portable kernel validates dtype and layout compatibility, resizes the
output tensor if needed, and performs an element-wise clone of the input
tensor.

Note: A future PR will add the ATen kernel for `_clone_dim_order`.

Related PRs:
- PR 2: [#12971](#12971) -
Register `_clone_dim_order` op and map `aten.clone`
- PR 3: [#12976](#12976) -
Update RemoveCloneOpsTransform to be dim_order aware

Fixes #12645 

### Test plan
Added kernel runtime tests to verify:
- Tensors of all real dtypes are cloned correctly.
- Failure when input and output tensor shapes mismatch.
- Failure with unsupported memory formats.
- Failure when `non_blocking=true` since the portable kernel only
supports blocking data transfer.
- Dynamic shape outputs are cloned with correct values.
- Layout conversions are handled correctly: `contiguous` to
`channels_last`, `channels_last` to `contiguous`, and `channels_last`
preserved.

All runtime tests pass via:
`build-ninja/kernels/test/portable_kernels_test`

---------

Co-authored-by: Gasoonjia <[email protected]>
```python
if node.op != "call_function":
    continue

# Identify clone_dim_order ops with unchanged memory layout.
```
Contributor


If we are supporting aten.clone elimination through this pass, then we should similarly check the memory_format arg.
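For example, a memory_format check on aten.clone nodes might look roughly like this (a sketch only; the helper name and the use of the exported `val` metadata are assumptions, not the pass's actual implementation):

```python
import torch
from torch.fx import Node


def _clone_changes_memory_format(node: Node) -> bool:
    """Hypothetical: does this aten.clone request a layout different from
    what its input already has?"""
    if node.target != torch.ops.aten.clone.default:
        return False
    requested = node.kwargs.get("memory_format", torch.preserve_format)
    if requested in (None, torch.preserve_format):
        return False
    input_val = node.args[0].meta.get("val")  # FakeTensor recorded by export
    if input_val is None:
        return True  # cannot tell, so conservatively keep the clone
    return not input_val.is_contiguous(memory_format=requested)
```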

Contributor Author


Great point! I added the check for aten.clone and updated the tests. I'll refactor/simplify the test cases if needed once we land the AOT PR since it includes its own tests.

agrima1304 pushed a commit to agrima1304/executorch that referenced this pull request Aug 26, 2025
Gasoonjia added a commit that referenced this pull request Aug 26, 2025
### Summary
This is PR 2 of 3 implementing a dim order aware clone op. 

This PR registers the new `_clone_dim_order` op and maps `aten.clone`
ops to `dim_order_ops._clone_dim_order` in EXIR during export to
preserve memory layout changes (contiguous/channels_last). It also
updates Core ML and ARM backends to handle the new clone op.

Related PRs:
- PR 1: [#12974](#12974) - Add
`_clone_dim_order` portable kernel
- PR 3: [#12976](#12976) -
Update RemoveCloneOpsTransform to be dim order aware

Fixes #12645  

### Test plan
- Operator level tests to verify kernel behavior for layout preservation
and changes.
- Graph level checks to confirm that clone mapping occurs.
- End to end tests to validate that functional clone behavior is
unchanged.

All tests pass via:
`python -m unittest exir.tests.test_memory_format_ops_pass`
`python -m unittest backends.apple.coreml.test.test_torch_ops`
`pytest backends/arm/test/ops/test_clone.py`
`pytest backends/arm/test/passes/test_remove_clone_pass.py`

---------

Co-authored-by: Gasoonjia <[email protected]>
Co-authored-by: Digant Desai <[email protected]>
Gasoonjia added a commit that referenced this pull request Sep 4, 2025
### Summary
This is PR 2 of 3 implementing a dim order aware clone op. 

This PR registers the new `_clone_dim_order` op and maps `aten.clone` to
`dim_order_ops._clone_dim_order` in EXIR during export to preserve
memory layout changes (contiguous/channels_last). It also updates the
Core ML, ARM, and Qualcomm backends to handle the new clone op.

Related PRs:
- PR 1: [#12974](#12974) - Add
`_clone_dim_order` portable kernel
- PR 3: [#12976](#12976) -
Update RemoveCloneOpsTransform to be dim order aware

Fixes #12645  

### Test plan
- Operator level tests to verify kernel behavior for layout preservation
and changes.
- Graph level checks to confirm that clone mapping occurs.
- End to end tests to validate that functional clone behavior is
unchanged.
- Backend tests to ensure clone semantics are preserved.

All tests pass via:
`python -m unittest exir.tests.test_memory_format_ops_pass`
`python -m unittest backends.apple.coreml.test.test_torch_ops`
`pytest backends/arm/test/ops/test_clone.py`
`pytest backends/arm/test/passes/test_remove_clone_pass.py`

---------

Co-authored-by: Gasoonjia <[email protected]>
Co-authored-by: Digant Desai <[email protected]>
@keyprocedure keyprocedure marked this pull request as ready for review September 4, 2025 20:12

```python
to_be_remove = n
# Skip removal of clone ops that modify layout/dim order.
if self.aten_clone_is_non_identity(
```
Contributor Author


The UFMT formatter forces this split style.

@keyprocedure
Contributor Author

@Gasoonjia this PR is ready to go, would you mind reviewing when you get a chance?

Contributor

@Gasoonjia Gasoonjia left a comment


Thanks @keyprocedure for your help!

Some subtle feedback, but overall it looks great!

```python
dead_code_elimination_pass(graph_module)
return PassResult(graph_module, True)

def aten_clone_is_non_identity(self, node: torch.fx.Node) -> bool:
```
Contributor


Let's combine the two check functions, aten_clone_is_non_identity and _clone_dim_order_is_non_identity, into a single function (maybe called _is_non_identity_clone). In the current scenario we always use these functions together, and this function should be private.
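A rough sketch of what that merged private helper could look like (assumptions: the `dim_order` kwarg name on `_clone_dim_order`, the use of exported `val` metadata, and the exact op handles; the two existing checks in the pass may differ in detail):

```python
import torch
from torch.fx import Node

# Assumed op handles; dim_order_ops are registered by ExecuTorch's exir
# dim-order ops module (exact import path not shown here).
_CLONE = torch.ops.aten.clone.default
_CLONE_DIM_ORDER = torch.ops.dim_order_ops._clone_dim_order.default


def _is_non_identity_clone(node: Node) -> bool:
    """Single merged check (sketch): keep only clones that change layout."""
    if node.op != "call_function":
        return False
    input_val = node.args[0].meta.get("val") if node.args else None

    if node.target == _CLONE:
        memory_format = node.kwargs.get("memory_format", torch.preserve_format)
        if memory_format in (None, torch.preserve_format):
            return False
        # Layout-changing only if the input is not already in that format.
        return input_val is None or not input_val.is_contiguous(
            memory_format=memory_format
        )

    if node.target == _CLONE_DIM_ORDER:
        dim_order = node.kwargs.get("dim_order")  # hypothetical kwarg name
        if dim_order is None or input_val is None:
            return True  # cannot tell, so conservatively keep the clone
        return list(dim_order) != list(input_val.dim_order())

    return False
```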

```python
self.assertTrue(is_contiguous_dim_order(actual))
self.assertTrue(is_contiguous_dim_order(expected))

def test_op_clone_replacement_channels_last_survives(self):
```
Contributor


Let's move this test to test_remove_clone_ops.py.

Contributor

@Gasoonjia Gasoonjia left a comment


LGTM! Thanks for your wonderful work!
Only a name update suggestion, but I think we can stamp it as long as CI passes!

```python
transformed_gm.code
)

def test_clone_channels_last_survives(self):
```
Contributor


Maybe we should call it test_clone_non_identity_survives? It survives because it mutates memory_format / dim_order, rather than because it clones a channels_last tensor.

Contributor Author


Oh right, that makes more sense.

@keyprocedure
Contributor Author

LGTM! Thanks for your wonderful work! Only a name update suggestion, but I think we can stamp it as long as CI passes!

No problem!
Should I push the name change now or would you rather we do it in another PR?

@Gasoonjia
Contributor

LGTM! Thanks for your wonderful work! Only a name update suggestion, but I think we can stamp it as long as CI passes!

No problem! Should I push the name change now or would you rather we do it in another PR?

Let's change it in this PR if you have time!

@pytorch-bot pytorch-bot bot removed the ciflow/trunk label Sep 5, 2025
@keyprocedure
Contributor Author

Done :)

@mergennachin
Contributor

@keyprocedure we might have to add the tests here: https://github.com/pytorch/executorch/blob/main/pytest.ini

@pytorch-bot pytorch-bot bot removed the ciflow/trunk label Sep 5, 2025
@keyprocedure
Contributor Author

@mergennachin good catch, I've added the test file to the pytest config.

@keyprocedure
Contributor Author

Hi @Gasoonjia, the CI failures are unrelated to this PR:

  • Documentation upload error
  • NXP backend: Missing out variants: {'quantized_decomposed::quantize_per_tensor', 'quantized_decomposed::dequantize_per_tensor'}
  • Config: Error merging override base.model_class=qwen2_5

Feel free to let me know if you'd like anything in this PR changed; otherwise, this should be good to go.

@Gasoonjia
Contributor

Thanks for your help @keyprocedure! I think the NXP issue should be solved already. Let me rebase and retest.

@Gasoonjia
Contributor

Hi @keyprocedure, CI looks good but there's a conflict in pytest.ini. Mind taking a look?
Also, #13735 raised some issues in our internal CI and I have a fix here: #14088. I will land this PR after the fix PR lands.

@pytorch-bot pytorch-bot bot removed the ciflow/trunk label Sep 8, 2025
@keyprocedure
Contributor Author

keyprocedure commented Sep 8, 2025

@Gasoonjia, the merge conflict is resolved: backends/transforms had already been added to the pytest config, so I removed the explicit backends/transforms/test/test_remove_clone_ops.py path.
Thanks for taking care of the internal CI issues!

@Gasoonjia
Contributor

#14088 has landed, but this CI run raised an issue that looks like it preexisted in our codebase. Will rebase and retrigger CI.

@pytorch-bot pytorch-bot bot removed the ciflow/trunk label Sep 9, 2025
@Gasoonjia Gasoonjia merged commit dbac09c into pytorch:main Sep 9, 2025
239 of 278 checks passed
@keyprocedure
Contributor Author

keyprocedure commented Sep 9, 2025

I appreciate your reviews across all the PRs, @Gasoonjia, and thanks for handling CI and the merges.

@Gasoonjia
Contributor

@keyprocedure thank you for such active contributions! Looking forward to working with you in the future!
